Music Genre Classification Using Spectral Analysis and Sparse Representation of the Signals
In this paper, we propose a robust music genre classification method based
on sparse FFT-based feature extraction, which exploits the
discriminating power of spectral analysis of non-stationary audio signals and
the capability of sparse representation based classifiers. The feature extraction
method combines two sets of features, namely short-term features (extracted from
windowed signals) and long-term features (extracted from combination of
extracted short-time features). Experimental results demonstrate that the
proposed feature extraction method leads to a sparse representation of audio
signals. As a result, a significant reduction in the dimensionality of the
signals is achieved. The extracted features are then fed into a sparse
representation based classifier (SRC). Our experimental results on the GTZAN
database demonstrate that the proposed method outperforms other
state-of-the-art SRC approaches. Moreover, the computational efficiency of the
proposed method is better than that of other Compressive Sampling (CS)-based
classifiers.
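The split between short-term and long-term features described in this abstract can be illustrated with a minimal sketch. It is not the authors' sparse FFT method: the spectral centroid as the short-term feature, the mean and standard deviation as long-term features, the periodic Hann window, and all function names are assumptions chosen for illustration only.

```python
import cmath
import math
from statistics import mean, pstdev


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping short-time frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Magnitude of the DFT of a Hann-windowed frame (naive O(n^2) DFT)."""
    n = len(frame)
    # Periodic Hann window (assumed choice, not taken from the paper).
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * k / n))
                for k, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * b * k / n)
                    for k, x in enumerate(windowed)))
            for b in range(n // 2 + 1)]


def spectral_centroid(mags):
    """Short-term feature: magnitude-weighted mean bin index."""
    total = sum(mags)
    return sum(b * m for b, m in enumerate(mags)) / total if total else 0.0


def long_term_features(short_term):
    """Long-term features: mean and standard deviation over all frames."""
    return mean(short_term), pstdev(short_term)


# Toy signal: a pure tone that falls exactly on bin 8 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
centroids = [spectral_centroid(magnitude_spectrum(f))
             for f in frame_signal(tone, frame_len=64, hop=32)]
mean_centroid, std_centroid = long_term_features(centroids)
```

In a full pipeline, the long-term feature vector of each track would be passed to a classifier such as an SRC; that step is omitted here.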
3D Video Quality Assessment
A key factor in designing 3D systems is to understand how different visual
cues and distortions affect the perceptual quality of 3D video. The ultimate
way to assess video quality is through subjective tests. However, subjective
evaluation is time consuming, expensive, and in most cases not even possible.
An alternative solution is objective quality metrics, which attempt to model
the Human Visual System (HVS) in order to assess the perceptual quality. The
potential of 3D technology to significantly improve the immersiveness of video
content has been hampered by the difficulty of objectively assessing Quality of
Experience (QoE). A no-reference (NR) objective 3D quality metric, which could
help determine capturing parameters and improve playback perceptual quality,
would be welcomed by camera and display manufacturers. Network providers would
embrace a full-reference (FR) 3D quality metric, as they could use it to ensure
efficient QoE-based resource management during compression and Quality of
Service (QoS) during transmission.
Comment: PhD Thesis, UBC, 201
Introducing A Public Stereoscopic 3D High Dynamic Range (SHDR) Video Database
High Dynamic Range (HDR) displays and cameras are making their way into
the consumer market at a rapid rate. Thanks to TV and camera
manufacturers, HDR systems are now becoming available commercially to end
users. This is taking place only a few years after the blooming of 3D video
technologies. MPEG/ITU are also actively working towards the standardization of
these technologies. However, preliminary research efforts in these video
technologies are hampered by the lack of sufficient experimental data. In this
paper, we introduce a Stereoscopic 3D HDR (SHDR) database of videos that is
made publicly available to the research community. We explain the procedure
taken to capture, calibrate, and post-process the videos. In addition, we
provide insights on potential use cases, challenges, and research
opportunities implied by the combination of the higher dynamic range of the HDR
aspect and the depth impression of the 3D aspect.
3D Video Quality Metric for 3D Video Compression
As the evolution of multiview display technology is bringing glasses-free
3DTV closer to reality, MPEG and VCEG are preparing an extension to HEVC to
encode multiview video content. View synthesis in the current version of the 3D
video codec is performed using PSNR as a quality metric. In this paper,
we propose a full-reference Human-Visual-System based 3D video quality metric
to be used in multiview encoding as an alternative to PSNR. Performance of our
metric is tested in a 2-view case scenario. The quality of the compressed
stereo pair, formed from a decoded view and a synthesized view, is evaluated at
the encoder side. The performance is verified through a series of subjective
tests and compared with that of PSNR, SSIM, MS-SSIM, VIFp, and VQM metrics.
Experimental results showed that our 3D quality metric has the highest
correlation with Mean Opinion Scores (MOS) compared to the other tested
metrics.
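For context, PSNR, the baseline metric this abstract proposes to replace, can be computed as in the minimal sketch below. The 8-bit peak value of 255, the flat pixel-list representation, and the sample pixel values are assumptions for illustration; the proposed HVS-based metric itself is not reproduced here.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equally sized frames given as flat pixel lists."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical 8-bit luma samples from a reference and a decoded frame.
reference = [52, 55, 61, 59, 79, 61, 76, 61]
distorted = [54, 55, 60, 59, 77, 63, 76, 60]
score = psnr(reference, distorted)
```

Because PSNR depends only on pixel-wise error, it ignores perceptual effects such as binocular fusion, which is the motivation given for an HVS-based alternative.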
ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11 - JCT3V-C0032: A human visual system based 3D video quality metric
This contribution proposes a full-reference Human-Visual-System based 3D
video quality metric. In this report, the presented metric is used to evaluate
the quality of compressed stereo pair formed from a decoded view and a
synthesized view. The performance of the proposed metric is verified through a
series of subjective tests and compared with that of PSNR, SSIM, MS-SSIM, VIFp,
and VQM metrics. The experimental results show that HV3D has the highest
correlation with Mean Opinion Scores (MOS) compared to other tested metrics.
Comment: arXiv admin note: substantial text overlap with arXiv:1803.04624,
arXiv:1803.0462
Effect of High Frame Rates on 3D Video Quality of Experience
In this paper, we study the effect of 3D videos with increased frame rates on
the viewers' quality of experience. We performed a series of subjective tests to
determine the subjects' preferences among videos of the same scene at four different
frame rates: 24, 30, 48, and 60 frames per second (fps). Results revealed that
subjects clearly prefer higher frame rates. In particular, Mean Opinion Score
(MOS) values associated with the 60 fps 3D videos were 55% greater than MOS
values of the 24 fps 3D videos.
An Efficient Human Visual System Based Quality Metric for 3D Video
Stereoscopic video technologies have been introduced to the consumer market
in the past few years. A key factor in designing a 3D system is to understand
how different visual cues and distortions affect the perceptual quality of
stereoscopic video. The ultimate way to assess 3D video quality is through
subjective tests. However, subjective evaluation is time consuming, expensive,
and in some cases not possible. The other solution is developing objective
quality metrics, which attempt to model the Human Visual System (HVS) in order
to assess perceptual quality. Although several 2D quality metrics have been
proposed for still images and videos, in the case of 3D, efforts are only at the
initial stages. In this paper, we propose a new full-reference quality metric
for 3D content. Our method mimics the HVS by fusing information from both the left
and right views to construct the cyclopean view, as well as taking into account
the sensitivity of the HVS to contrast and the disparity of the views. In addition,
a temporal pooling strategy is utilized to address the effect of temporal
variations of the quality in the video. Performance evaluations showed that our
3D quality metric quantifies quality degradation caused by several
representative types of distortions very accurately, with a Pearson correlation
coefficient of 90.8%, a competitive performance compared to
state-of-the-art 3D quality metrics.
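The Pearson correlation coefficient reported in this abstract measures how linearly an objective metric tracks subjective scores. A minimal sketch of the computation follows; the metric scores and MOS values are hypothetical and are not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical objective scores and Mean Opinion Scores for five test videos.
metric_scores = [0.91, 0.78, 0.65, 0.52, 0.40]
mos = [4.6, 4.1, 3.2, 2.9, 1.8]
r = pearson(metric_scores, mos)
```

A value of r close to 1 indicates that the metric's scores rise and fall in step with the viewers' opinions.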
3D Video Quality Metric for Mobile Applications
In this paper, we propose a new full-reference quality metric for mobile 3D
content. Our method is modeled around the Human Visual System, fusing the
information of both left and right channels, considering color components, the
cyclopean views of the two videos and disparity. Our method assesses the
quality of 3D videos displayed on a mobile 3DTV, taking into account the effect
of resolution, distance from the viewer's eyes, and dimensions of the mobile
display. Performance evaluations showed that our mobile 3D quality metric
monitors the degradation of quality caused by several representative types of
distortion with 82 percent correlation with the results of subjective tests, an
accuracy much better than that of the state-of-the-art mobile 3D quality
metric.
Comment: arXiv admin note: substantial text overlap with arXiv:1803.04624;
text overlap with arXiv:1803.04832 and arXiv:1803.0483
A Learning-Based Visual Saliency Fusion Model for High Dynamic Range Video (LBVS-HDR)
Saliency prediction for Standard Dynamic Range (SDR) videos has been well
explored in the last decade. However, limited studies are available on High
Dynamic Range (HDR) Visual Attention Models (VAMs). Considering that the
characteristics of HDR content in terms of dynamic range and color gamut are
quite different from those of SDR content, it is essential to identify the
importance of different saliency attributes of HDR videos for designing a VAM
and understand how to combine these features. To this end, we propose a
learning-based visual saliency fusion method for HDR content (LBVS-HDR) to
combine various visual saliency features. In our approach, various conspicuity
maps are extracted from HDR data, and a Random Forests algorithm is then used
to train a fusion model on data collected
from an eye-tracking experiment. Performance evaluations demonstrate the
superiority of the proposed fusion method over other existing fusion
methods.
Compression of High Dynamic Range Video Using the HEVC and H.264/AVC Standards
The existing video coding standards such as H.264/AVC and High Efficiency
Video Coding (HEVC) have been designed based on the statistical properties of
Low Dynamic Range (LDR) videos and are not tailored to the characteristics of
High Dynamic Range (HDR) content. In this study, we investigate the performance
of the latest LDR video compression standard, HEVC, as well as the
widely used commercial video compression standard, H.264/AVC, on HDR content.
Subjective evaluations of results on an HDR display show that viewers clearly
prefer the videos coded via an HEVC-based encoder to the ones encoded using an
H.264/AVC encoder. In particular, HEVC outperforms H.264/AVC by an average of
10.18% in terms of mean opinion score and 25.08% in terms of bit rate savings.